UCell signature enrichment - interacting with Seurat
In this demo, we will apply UCell to evaluate gene signatures in single-cell PBMC data. We will use a subset of the data from Hao and Hao et al., bioRxiv 2020, which comprises multiple immune cell types at different levels of resolution. Because these cells were characterized both in terms of transcriptomes (using scRNA-seq) and surface proteins (using a panel of antibodies), the cell type annotations should be of very high quality. To demonstrate how UCell can simply and accurately evaluate gene signatures on a query dataset, we will apply it directly to the Seurat object from Hao and Hao et al. and compare the signature scores to the original cluster annotations by the authors.
Installation
Install UCell and dependencies
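The installation commands are not shown here, so the following is only a minimal sketch of one way to check the setup. It assumes that Seurat is installed from CRAN and UCell from Bioconductor; the install commands are left as comments so that nothing is installed without your say-so.

```r
# Packages this demo relies on (assumption: no others are needed).
required <- c("Seurat", "UCell")

# Identify which of them are not yet installed.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
  # install.packages("Seurat")
  # install.packages("BiocManager"); BiocManager::install("UCell")
}
```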
Query single-cell data
Obtain a downsampled version (20,000 T cells) of the data from Hao and Hao et al., bioRxiv 2020, at the following link: https://drive.switch.ch/index.php/s/3kM5PQ0tQaG6d6A
Then load the object and visualize the clustering annotation by the authors.
# Load the packages used throughout this demo
library(Seurat)
library(UCell)
pbmc.Tcell <- readRDS("pbmc_multimodal.downsampled20k.Tcell.seurat.RNA.rds")
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)

Define some signatures for T cell subtypes
markers <- list()
markers$Tcell_CD4 <- c("CD4", "CD40LG")
markers$Tcell_CD8 <- c("CD8A", "CD8B")
markers$Tcell_Treg <- c("FOXP3", "IL2RA")
markers$Tcell_MAIT <- c("KLRB1", "SLC4A10", "NCR3")
markers$Tcell_gd <- c("TRDC", "TRGC1", "TRGC2", "TRDV1")
markers$Tcell_NK <- c("FGFBP2", "SPON2", "KLRF1", "FCGR3A", "KLRD1", "TRDC")

Score signatures using UCell
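Before scoring, it can be useful to check that every signature gene is actually present in the dataset. The sketch below uses a hypothetical vector `available_genes` as a stand-in for `rownames(pbmc.Tcell)` and a shortened, hypothetical `toy_markers` list, so that it runs without the Seurat object.

```r
# Hypothetical stand-in for rownames(pbmc.Tcell); replace with the real gene names.
available_genes <- c("CD4", "CD40LG", "CD8A", "CD8B", "FOXP3", "IL2RA", "KLRB1")

# Shortened copy of the signature list, for illustration only.
toy_markers <- list(
  Tcell_CD4  = c("CD4", "CD40LG"),
  Tcell_CD8  = c("CD8A", "CD8B"),
  Tcell_MAIT = c("KLRB1", "SLC4A10", "NCR3")
)

# For each signature, list the genes that are absent from the dataset,
# then keep only the signatures with at least one missing gene.
missing_genes <- lapply(toy_markers, setdiff, y = available_genes)
missing_genes <- missing_genes[lengths(missing_genes) > 0]
print(missing_genes)
```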
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers)
signature.names <- paste0(names(markers), "_UCell")
VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l1")

How do signatures compare to original annotations
Idents(pbmc.Tcell) <- "celltype.l2"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label.size = 3,
    repel = TRUE, label = TRUE)

Compare to AddModuleScore from Seurat
AddModuleScore from Seurat is very fast, but the score is highly dependent on the composition of the dataset. Here we will apply AddModuleScore with a simple CD8 T cell signature to two datasets: a set composed of different T cell types (pbmc.Tcell) and a subset of this dataset only comprising the CD8 T cells (pbmc.Tcell.CD8).
First, generate a subset only comprising CD8 T cells (pbmc.Tcell.CD8)
Idents(pbmc.Tcell) <- "celltype.l1"
pbmc.Tcell.CD8 <- subset(pbmc.Tcell, idents = c("CD8 T"))
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()

Note that applying the same signature to the complete set or to the CD8 T subset gives very different results. When other cell types are present, the score distribution for CD8 T cells has a median close to 1, but the same CD8 T cells evaluated alone give a zero-centered distribution of scores. A score that changes so dramatically for the same cells depending on the composition of the dataset may be undesirable.
set.seed(123)
markers.cd8 <- list(Tcell_CD8 = c("CD8A", "CD8B"))
pbmc.Tcell <- AddModuleScore(pbmc.Tcell, features = markers.cd8, name = "Tcell_CD8_Seurat")
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_Seurat1")
pbmc.Tcell.CD8 <- AddModuleScore(pbmc.Tcell.CD8, features = markers.cd8, name = "Tcell_CD8_Seurat")
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_Seurat1")
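a | b

Summary of Tcell_CD8_Seurat1 for CD8 T cells scored within the complete dataset (pbmc.Tcell):
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.6057   0.5149   0.9236   0.8756   1.2673   2.3228

Summary of Tcell_CD8_Seurat1 for the CD8 T subset scored alone (pbmc.Tcell.CD8):
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.65105 -0.44921 -0.03485 -0.09280  0.30758  1.39551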
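The shift seen above follows from how this kind of score is built: the average expression of the signature genes is compared against control genes chosen to have similar average expression across the whole dataset, so the controls change when the dataset changes. The toy sketch below illustrates that mechanism in base R. It is a deliberate simplification and not Seurat's implementation: the function `toy_module_score`, the toy matrix, and the nearest-mean choice of a single control gene (instead of binned random sampling) are all assumptions made for illustration.

```r
# Toy score: mean signature expression minus mean expression of control genes.
# Controls are the background genes whose dataset-wide mean is closest to
# that of each signature gene (simplified stand-in for binned random sampling).
toy_module_score <- function(expr, signature, n_ctrl = 1) {
  gene_means <- rowMeans(expr)
  background <- setdiff(rownames(expr), signature)
  ctrl <- unique(unlist(lapply(signature, function(g) {
    d <- abs(gene_means[background] - gene_means[g])
    background[order(d)][seq_len(n_ctrl)]
  })))
  colMeans(expr[signature, , drop = FALSE]) - colMeans(expr[ctrl, , drop = FALSE])
}

# Toy data: 4 "CD8" cells and 4 "CD4" cells; background genes have
# constant expression levels 0.5, 1.0, ..., 3.0 in every cell.
cells <- c(paste0("CD8_", 1:4), paste0("CD4_", 1:4))
bg <- matrix(rep(seq(0.5, 3, by = 0.5), times = length(cells)), nrow = 6,
             dimnames = list(paste0("bg", 1:6), cells))
expr <- rbind(CD8A = rep(c(3, 0), each = 4), CD8B = rep(c(3, 0), each = 4), bg)

signature <- c("CD8A", "CD8B")
cd8_cells <- cells[1:4]

full_scores   <- toy_module_score(expr, signature)               # mixed dataset
subset_scores <- toy_module_score(expr[, cd8_cells], signature)  # CD8 cells only

print(full_scores[cd8_cells])  # 1.5 for every CD8 cell
print(subset_scores)           # 0 for the same cells
```

In the mixed dataset the signature genes have an intermediate average expression, so a moderately expressed control is picked and the CD8 cells score well above zero. In the CD8-only subset the same genes have a high average, a highly expressed control is picked, and the scores collapse to zero.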
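The UCell score is based on the ranking of genes within each cell, so it does not depend on the composition of the query dataset. The score summaries for CD8 T cells are nearly identical whether the cells are scored within the complete dataset or alone: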
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers.cd8)
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_UCell")
pbmc.Tcell.CD8 <- AddModuleScore_UCell(pbmc.Tcell.CD8, features = markers.cd8)
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_UCell")
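a | b

Summary of Tcell_CD8_UCell for CD8 T cells scored within the complete dataset (pbmc.Tcell):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.3859  0.5280  0.5367  0.7803  0.9360

Summary of Tcell_CD8_UCell for the CD8 T subset scored alone (pbmc.Tcell.CD8):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.3840  0.5347  0.5375  0.7820  0.9377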
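To see why a rank-based score is insensitive to dataset composition, the base R sketch below computes a simplified score from within-cell gene ranks, following the Mann-Whitney U formulation described for UCell: U = sum of signature ranks - n(n+1)/2, score = 1 - U / (n * max_rank). This is an illustration under stated assumptions, not the package's code: `toy_ucell` and the toy matrix are hypothetical, ties are averaged, and `max_rank` is set to 8 only because the toy data has 8 genes.

```r
# Simplified rank-based score computed independently for each cell (column).
toy_ucell <- function(expr, signature, max_rank = 8) {
  n <- length(signature)
  apply(expr, 2, function(cell) {
    r <- rank(-cell, ties.method = "average")  # rank 1 = most expressed gene
    r <- pmin(r[signature], max_rank + 1)      # cap ranks beyond max_rank
    u <- sum(r) - n * (n + 1) / 2              # Mann-Whitney U statistic
    1 - u / (n * max_rank)                     # rescale so 1 = top-ranked
  })
}

# Toy data: 4 "CD8" cells and 4 "CD4" cells; background genes have
# constant expression levels 0.5, 1.0, ..., 3.0 in every cell.
cells <- c(paste0("CD8_", 1:4), paste0("CD4_", 1:4))
bg <- matrix(rep(seq(0.5, 3, by = 0.5), times = length(cells)), nrow = 6,
             dimnames = list(paste0("bg", 1:6), cells))
expr <- rbind(CD8A = rep(c(5, 0), each = 4), CD8B = rep(c(4, 0), each = 4), bg)

signature <- c("CD8A", "CD8B")
cd8_cells <- cells[1:4]

full_scores   <- toy_ucell(expr, signature)               # mixed dataset
subset_scores <- toy_ucell(expr[, cd8_cells], signature)  # CD8 cells only

# Each cell is ranked on its own, so removing other cells changes nothing.
print(identical(full_scores[cd8_cells], subset_scores))
```

Because each column is ranked independently, the score of a cell depends only on that cell's own expression profile, which is the property the comparison above relies on.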
Idents(pbmc.Tcell) <- "celltype.l1"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)
FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = c("Tcell_CD8_UCell", "Tcell_CD8_Seurat1"),
    ncol = 2, order = TRUE)

Idents(pbmc.Tcell.CD8) <- "celltype.l2"
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()
FeaturePlot(pbmc.Tcell.CD8, reduction = "wnn.umap", features = c("Tcell_CD8_UCell",
    "Tcell_CD8_Seurat1"), ncol = 2, order = TRUE)

Further reading
For more examples of UCell functionality, see the Basic Tutorial.
The code is available at the UCell GitHub repository.